point cloud
- North America > United States (0.28)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Materials (0.67)
- Government (0.45)
- North America > United States > Oregon (0.04)
- Asia > Middle East > Israel (0.04)
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models Zhimin Chen
Foundation models have achieved remarkable results in 2D and language tasks like image segmentation, object detection, and visual-language understanding. However, their potential to enrich 3D scene representation learning is largely untapped due to the existence of the domain gap. In this work, we propose an innovative methodology called Bridge3D to address this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our method employs semantic masks from foundation models to guide the masking and reconstruction process for the masked autoencoder, enabling more focused attention on foreground representations.
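The abstract does not say how the semantic masks steer the masking step. One way to picture it is a sampler that prefers foreground patches when choosing what to hide from the autoencoder. The sketch below is an illustration under that assumption, not the Bridge3D procedure: the function name, the `fg_weight` parameter, and the per-patch boolean input are all hypothetical.

```python
import random


def foreground_guided_mask(is_foreground, mask_ratio=0.6, fg_weight=3.0, seed=0):
    """Choose patch indices to mask, favouring foreground patches.

    is_foreground: list of bools, one per point-cloud patch (hypothetical input,
                   assumed to be derived from a foundation model's semantic mask).
    mask_ratio:    fraction of patches to mask.
    fg_weight:     how much more likely a foreground patch is to be masked
                   than a background patch (assumed value, not from the paper).
    """
    rng = random.Random(seed)
    n_mask = round(len(is_foreground) * mask_ratio)

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw key = u ** (1 / weight) per item and keep the largest keys.
    keys = []
    for index, fg in enumerate(is_foreground):
        weight = fg_weight if fg else 1.0
        keys.append((rng.random() ** (1.0 / weight), index))
    keys.sort(reverse=True)

    return sorted(index for _, index in keys[:n_mask])


if __name__ == "__main__":
    patches = [True] * 4 + [False] * 6  # 4 foreground, 6 background patches
    print(foreground_guided_mask(patches, mask_ratio=0.6))
```

With `fg_weight=1.0` this reduces to uniform random masking, which makes the foreground bias easy to ablate.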
- Asia > Middle East > Israel (0.04)
- Asia > Middle East > Jordan (0.04)
All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation Liyao Tang
This approach may, however, hinder the comprehensive exploitation of unlabeled data points. We hypothesize that this selective usage arises from the noise in pseudo-labels generated on unlabeled data. The noise in pseudo-labels may result in significant discrepancies between pseudo-labels and model predictions, thus confusing and affecting the model training greatly.
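The excerpt does not give the loss terms. As a rough illustration of what the title suggests, the sketch below assumes the two ingredients take their textbook forms: Shannon entropy of the pseudo-label distribution p (penalising noisy, uncertain pseudo-labels) and KL divergence from p to the model prediction q (penalising the discrepancy the text describes). The function names and the weight `lam` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy H(p) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): discrepancy between pseudo-label p and prediction q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) = H(p) + KL(p || q)."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def entropy_plus_alignment_loss(p, q, lam=1.0):
    """Assumed combined objective: lam * H(p) + KL(p || q)."""
    return lam * entropy(p) + kl_divergence(p, q)


if __name__ == "__main__":
    pseudo_label = [0.7, 0.2, 0.1]  # soft pseudo-label for one unlabeled point
    prediction = [0.5, 0.3, 0.2]    # model prediction for the same point
    print(entropy_plus_alignment_loss(pseudo_label, prediction))
    print(cross_entropy(pseudo_label, prediction))
```

Under these assumptions, setting `lam=1.0` makes the combined term equal to the ordinary cross-entropy between p and q, so it can be applied to every unlabeled point without a confidence threshold.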
- North America > United States (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
CityRefer Datasheet We follow the guidelines of the datasheets for datasets [1] to explain the composition, collection, recommended use case, and other details of the CityRefer dataset.
For what purpose was the dataset created? We created this CityRefer dataset to facilitate research toward city-scale 3D visual grounding. Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., Who funded the creation of the dataset? What do the instances that comprise the dataset represent? CityRefer contains descriptions for 3D visual grounding on large-scale point cloud data.
- Europe > United Kingdom > England > Staffordshire (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation (Supplementary Material)
Differently, our unsupervised multi-body task requires the model's ability to handle part-level local equivariance. Figure 1: Structure of our feature extractor based on EPN. "EPNConv" is the SE(3)-equivariant convolution proposed in the vanilla EPN network. Part-level SE(3)-equivariance is desirable for motion analysis, especially rotation estimation. Song and Yang utilized the methodology proposed by Choy et al. All other objects were considered part of the background.
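For readers unfamiliar with the term, a function f on point sets is SE(3)-equivariant if applying a rigid motion before f gives the same result as applying it after: f(Rx + t) = R f(x) + t. The sketch below checks this property numerically for the simplest such function, the centroid. It is only a definition check and has nothing to do with EPN's architecture; all names are illustrative, and rotations are restricted to the z-axis to keep the code short.

```python
import math


def rotate_z(point, theta):
    """Rotate a 3D point about the z-axis by theta radians."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)


def rigid_transform(points, theta, translation):
    """Apply a rotation about z followed by a translation to every point."""
    return [
        tuple(r + t for r, t in zip(rotate_z(p, theta), translation))
        for p in points
    ]


def centroid(points):
    """Mean of a point set; a trivially SE(3)-equivariant feature."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
    theta, shift = 0.7, (0.5, -1.0, 2.0)

    transform_then_f = centroid(rigid_transform(cloud, theta, shift))
    f_then_transform = rigid_transform([centroid(cloud)], theta, shift)[0]

    # The two results agree up to floating-point error.
    print(transform_then_f)
    print(f_then_transform)
```

Part-level equivariance, as mentioned in the text, asks for the same property to hold separately for each rigid part when the parts move independently.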
- Asia > Middle East > Israel (0.04)
- Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)